Performance bounds of a class of sample-based classification procedures
using the k-nearest-neighbor rule (k-NNR) are considered in this paper. By
using the k-NNR for decision, we show that the lower bounds of the probability
of correct decision are very close to that obtained with the Bayes linear
discriminant analysis based on the assumption of two multivariate Gaussian
densities with different mean vectors but equal covariance matrices. This
surprisingly good result suggests that the nonparametric method is very
effective in small-sample-size situations, which are of much practical
significance. By using the k-NNR for density estimates, an upper bound
of the probability of correct decision provides an optimistic estimate
of the performance, which again indicates the effectiveness of the
nonparametric technique.
I. Introduction

This paper is concerned with the performance bounds of a class of sample-based
classification procedures using the k-nearest-neighbor rule for decision or
density estimation. In particular, we show that the lower bounds of the
probability of correct decision are very close to that obtained by linear
discriminant analysis based on the assumption of two multivariate Gaussian
densities with different mean vectors but equal covariance matrices. This
surprisingly good result makes the nonparametric methods very attractive at
small sample sizes, which is the case in many applications such as pattern
recognition, operational research, quality control and related computer
science areas.

II.  The  k-Nearest-Neighbor  Classification  Rule 

Consider two hypotheses $H_1$ and $H_2$. Let $k$ be the total number of
nearest neighbors considered. The conditional error of the k-nearest-neighbor
rule (k-NNR) for given $X$ is given by Eq. 6-70 of Fukunaga [1],

$$
r_k(X) = r^*(X) \sum_{j=0}^{(k-1)/2} \binom{k}{j} r^*(X)^j [1 - r^*(X)]^{k-j}
       + [1 - r^*(X)] \sum_{j=(k+1)/2}^{k} \binom{k}{j} r^*(X)^j [1 - r^*(X)]^{k-j}
\tag{1}
$$

where the first and the second terms are the conditional errors for
$X \in H_1$ and $X \in H_2$ respectively, and $r^*(X)$ is the Bayes
conditional error,

$$ r^*(X) = \min[P(H_1 \mid X), P(H_2 \mid X)] $$

It can be shown that Eq. (1) is a concave function of $r^*(X)$. For example,
let $k = 3$,

$$ r_k(X) = r^*(X) [1 - r^*(X)] \{1 + 4 r^*(X) - 4 r^*(X)^2\} $$

which is always greater than or equal to $r^*(X)$ and is monotonically
increasing with $r^*(X)$ for $0 \le r^*(X) \le 1/2$. $r_k(X)$ is also
symmetric about $r^*(X) = 1/2$. The concave property holds in general for
any $k$. Figure 1 is a typical plot of $r_k(X)$ versus $r^*(X)$ for $k = 3$.
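The sum in Eq. (1) is easy to evaluate numerically. The following is a minimal sketch, assuming odd $k$ and a Bayes conditional error in $[0, 1/2]$; the function name `knn_conditional_error` is hypothetical and not taken from the paper or from [1]. It can be used to check the closed form quoted for $k = 3$.

```python
from math import comb


def knn_conditional_error(r_star: float, k: int) -> float:
    """Evaluate Eq. (1): conditional k-NNR error for odd k.

    r_star is the Bayes conditional error r*(X), assumed to lie in [0, 0.5].
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    if not 0.0 <= r_star <= 0.5:
        raise ValueError("r_star must lie in [0, 0.5]")
    q = 1.0 - r_star
    half = (k - 1) // 2
    # First sum: j = 0 .. (k-1)/2, weighted by r*(X).
    low = sum(comb(k, j) * r_star**j * q ** (k - j) for j in range(half + 1))
    # Second sum: j = (k+1)/2 .. k, weighted by 1 - r*(X).
    high = sum(comb(k, j) * r_star**j * q ** (k - j) for j in range(half + 1, k + 1))
    return r_star * low + q * high


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.25, 0.5):
        closed_form = r * (1 - r) * (1 + 4 * r - 4 * r * r)
        print(r, knn_conditional_error(r, 3), closed_form)
```

For $k = 1$ the same function reduces to $2 r^*(X)[1 - r^*(X)]$, the familiar single-nearest-neighbor conditional error.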


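Taking the expectation of both sides of Eq. (1), we obtain the average or
unconditional error,

$$
r_k = E[r_k(X)] \le \sum_{j=0}^{(k-1)/2} \binom{k}{j} [E(r^*(X))]^{j+1} [1 - E(r^*(X))]^{k-j}
    + \sum_{j=(k+1)/2}^{k} \binom{k}{j} [E(r^*(X))]^{j} [1 - E(r^*(X))]^{k-j+1}
\tag{2}
$$

where the inequality follows from the concavity of Eq. (1) in $r^*(X)$.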


Here the Bayes error $E[r^*(X)]$ is unknown, however. By assuming multivariate
Gaussian densities for the measurement,

$$ p(X \mid H_1) = N(\mu_1, \Sigma), \qquad p(X \mid H_2) = N(\mu_2, \Sigma) $$

an estimate of the Bayes error is given by (see e.g. [2])

$$ G(-D/2) = \int_{D/2}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt $$

and

$$ D^2 = (\bar X_1 - \bar X_2)' S^{-1} (\bar X_1 - \bar X_2) $$

is the estimated Mahalanobis distance between the two populations. Here
$\bar X_1$ and $\bar X_2$ are the sample means based on sample sizes $n_1$ and
$n_2$ respectively and

$$
S = \left[ \sum_{i=1}^{n_1} (X_i - \bar X_1)(X_i - \bar X_1)'
         + \sum_{i=1}^{n_2} (X_i - \bar X_2)(X_i - \bar X_2)' \right] / (n - 2)
$$

is the sample estimate of the common covariance matrix. $n = n_1 + n_2$ is the
total number of training (labelled) samples. Let $\hat r_k$ be the sample
estimate of the average error.


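$$ \hat r_k = \hat r_k^1 + \hat r_k^2 $$

where

$$ \hat r_k^1 = \sum_{j=0}^{(k-1)/2} \binom{k}{j} G(-D/2)^{j+1} [1 - G(-D/2)]^{k-j} $$

$$ \hat r_k^2 = \sum_{j=(k+1)/2}^{k} \binom{k}{j} G(-D/2)^{j} [1 - G(-D/2)]^{k-j+1} $$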

Table 1 is a tabulation of $D$, $\hat r_k^1$, $\hat r_k^2$, and $\hat r_k$.
As indicated by Eq. (2), $\hat r_k$ is an upper bound of the error probability
using the k-NNR. $1 - \hat r_k$, as plotted in Fig. 2, is the lower bound of
the probability of correct decision.
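The sample bound can be computed directly from two labelled training sets. The sketch below is illustrative only and uses the Python standard library: it forms the pooled covariance with divisor $n - 2$, solves for the estimated Mahalanobis distance $D$, and evaluates the two sums with $G(-D/2)$ in place of the Bayes error. All function names are hypothetical, and the exponent $k - j + 1$ in the second sum is assumed so that the total matches Eq. (1) evaluated at $G(-D/2)$.

```python
from math import comb, erfc, sqrt


def gaussian_tail(x):
    """Upper-tail probability of a standard Gaussian, i.e. G(-x)."""
    return 0.5 * erfc(x / sqrt(2.0))


def mean_vector(samples):
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def pooled_covariance(class1, class2):
    """Common covariance estimate S with divisor n1 + n2 - 2."""
    p = len(class1[0])
    s = [[0.0] * p for _ in range(p)]
    for samples in (class1, class2):
        m = mean_vector(samples)
        for x in samples:
            d = [xi - mi for xi, mi in zip(x, m)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    dof = len(class1) + len(class2) - 2
    return [[v / dof for v in row] for row in s]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance estimate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def mahalanobis_distance(class1, class2):
    """Estimated distance D between the two sample means."""
    m1, m2 = mean_vector(class1), mean_vector(class2)
    diff = [a - b for a, b in zip(m1, m2)]
    s = pooled_covariance(class1, class2)
    return sqrt(sum(d * w for d, w in zip(diff, solve(s, diff))))


def knn_error_bound(d, k):
    """Return the two partial sums and their total for odd k."""
    g = gaussian_tail(d / 2.0)
    half = (k - 1) // 2
    r1 = sum(comb(k, j) * g ** (j + 1) * (1 - g) ** (k - j) for j in range(half + 1))
    r2 = sum(comb(k, j) * g**j * (1 - g) ** (k - j + 1) for j in range(half + 1, k + 1))
    return r1, r2, r1 + r2
```

A typical use would be `d = mahalanobis_distance(class1, class2)` followed by `knn_error_bound(d, 3)`; one minus the third returned value is then the corresponding lower bound on the probability of correct decision.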

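Table 1

[Tabulated values of $D$, $\hat r_k^1$, $\hat r_k^2$ and $\hat r_k$. The table
layout is not recoverable from the scanned source; the legible entries include
0.255, 0.163 and 0.022.]
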


A proper choice of $k$ depends on the sample size $n$ and the dimension $p$ of
the vector measurement. For the Gaussian assumption of densities, the
relationship as given by Fukunaga and Hostetler [3] is tabulated in Table 2.

Table 2

$k_{opt}$ | $0.75 n^{1/2}$ | $0.94 n^{1/3}$ | $0.62 n^{1/5}$ | $0.42 n^{1/9}$

It is interesting to note that the above result is consistent with the
computer simulation result reported by Goldstein [4], which shows that for
$p = 2$, $k$ is proportional to $n^a$ with $a \ge 0.5$.

For comparison purposes we shall now consider the performance of the linear
discriminant function.

III.  The  Linear  Discriminant  Analysis 

With the multivariate Gaussian assumption as described in the previous
section, the sample estimate of the error probability $G(-D/2)$ is one of
several error estimates which have been examined ([2], [5]). $G(-D/2)$ in fact
provides only a lower bound of the error probability, as can be seen below.

McLachlan [2] has given the following expectation of the sample error estimate,

$$ E[G(-D/2)] = G(-\Delta/2) + B_1 + B_2 \tag{4} $$

where $\Delta$ is the true Mahalanobis distance when both mean vectors and the
covariance matrix are known exactly, and $B_1$ and $B_2$ are given by

[The expressions for $B_1$ and $B_2$, which involve $g(\cdot)$, $\Delta$, $p$,
$n_1$ and $n_2$, are illegible in the scanned source; see [2].]


where $g(y) = \frac{1}{\sqrt{2\pi}} e^{-y^2/2}$. Both $B_1$ and $B_2$ approach
0 as $n_1, n_2 \to \infty$. $B_1$ obviously is not good for small $\Delta$. A
sample calculation for $n_1 = n_2 = 5$ and $p = 2$ gives $B_1 = -0.030$,
$B_2 = -0.00118$ at $\Delta = 2$, and $B_1 = -0.000675$ and $B_2 \approx 0$ at
$\Delta = 4$. Since $B_1$ and $B_2$ are negative or otherwise negligibly small,
$E[G(-D/2)]$ is thus smaller than its true value. $1 - G(-D/2)$ is also plotted
in Figure 2. The close proximity of the two curves clearly indicates that the
k-NNR can provide a performance very close to the Bayes decision rule under
the Gaussian assumption.
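As a worked check of the size of this bias, the sample values quoted above for $\Delta = 2$ can be added to the true error $G(-\Delta/2)$. The snippet is only an arithmetic illustration; the helper name `gaussian_tail` is hypothetical, and the values of $B_1$ and $B_2$ are copied from the text rather than recomputed.

```python
from math import erfc, sqrt


def gaussian_tail(x):
    """Upper-tail probability of a standard Gaussian, i.e. G(-x)."""
    return 0.5 * erfc(x / sqrt(2.0))


delta = 2.0
b1, b2 = -0.030, -0.00118  # values quoted in the text for n1 = n2 = 5, p = 2

true_error = gaussian_tail(delta / 2.0)      # G(-delta/2)
expected_estimate = true_error + b1 + b2     # expectation of the sample estimate

print(f"G(-delta/2)          = {true_error:.4f}")
print(f"E[G(-D/2)] (approx.) = {expected_estimate:.4f}")
```

With these numbers the expected sample estimate is roughly 0.03 below the true error, which is the sense in which $G(-D/2)$ is optimistic.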
IV. Classification Based on the k-Nearest-Neighbor Density Estimates

Let $k_1$ and $k_2$ be the numbers of nearest neighbors from populations $H_1$
and $H_2$ respectively; $k = k_1 + k_2$. Let $d_i$ be the distance between $X$
and its kth nearest neighbor, nearness being measured by any convenient metric
$d(X, Y)$. Let $S_i(X)$ be the region about $X$ containing its k-NN,

$$ S_i(X) = \{Y : d_i(X, Y) < d_i\}, \qquad i = 1, 2 $$

where the index refers to the hypothesis considered. Also let $v_i(X)$ be the
volume of this region,

$$ v_i(X) = \int_{S_i(X)} dY $$

The k-NN estimate of the probability density is

$$ \hat p_i(X) = \frac{k_i - 1}{n_i \, v_i(X)} $$
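A minimal sketch of this density estimate for one class is given below. It assumes the Euclidean metric, so that the region is a $p$-dimensional ball whose radius is the distance to the kth nearest training sample; the paper allows any convenient metric. The names `ball_volume` and `knn_density` are hypothetical.

```python
from math import dist, gamma, pi


def ball_volume(radius, p):
    """Volume of a p-dimensional Euclidean ball of the given radius."""
    return pi ** (p / 2) / gamma(p / 2 + 1) * radius**p


def knn_density(x, samples, k):
    """k-NN density estimate (k - 1) / (n * v) at point x.

    samples holds the n training vectors of one class; v is the volume of the
    ball centred at x that reaches the kth nearest of those samples.
    """
    n = len(samples)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of samples")
    distances = sorted(dist(x, s) for s in samples)
    radius = distances[k - 1]
    if radius == 0.0:
        raise ValueError("kth neighbor coincides with x; volume is zero")
    return (k - 1) / (n * ball_volume(radius, len(x)))


if __name__ == "__main__":
    # Evenly spaced points on [0, 1]; the estimate at 0.5 should be near 1.
    grid = [[i / 100] for i in range(101)]
    print(knn_density([0.5], grid, 5))
```

Calling the function once per class, with that class's samples and its own $k_i$, gives the two estimates that the test statistics compare.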


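We can now define new test statistics as

$$
t_1 = \frac{k_1 - 1}{n \, n_1} \sum_{i=1}^{n} \frac{1}{v_1(X_i)}, \qquad
t_2 = \frac{k_2 - 1}{n \, n_2} \sum_{i=1}^{n} \frac{1}{v_2(X_i)}
\tag{5}
$$

and the decision rule is to accept hypothesis $H_1$ if $t_1 > t_2$, otherwise
accept hypothesis $H_2$. Also define the coverage $u_i(X)$ as

$$ u_i(X) = \int_{S_i(X)} p_i(Y) \, dY; \qquad p_i(Y) = p(Y \mid H_i) $$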


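It has been shown [3] that

$$
\frac{1}{v_i(X)} \approx \frac{p_i(X)}{u_i(X)}
  + \frac{c_i(X)}{p_i^{2/p}(X) \, u_i^{1-2/p}(X)}
$$

where $c_i(X)$ is a coefficient whose expression is illegible in the scanned
source (see [3]), and $A_i$ is the transformation matrix for

$$ d_i^2(X, Y) = (Y - X)' A_i (Y - X) $$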


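Now the mean and variance of $t_i$ can be determined as:

$$
\bar t_i = E[t_i] = \frac{k_i - 1}{n_i} \left\{ \frac{n_i}{k_i - 1} \, p_i(X)
  + \left( \frac{n_i}{k_i - 1} \right)^{1-2/p} \frac{c_i(X)}{p_i^{2/p}(X)} \right\}
$$

$$
\sigma_i^2 = E[t_i - \bar t_i]^2 = \left( \frac{k_i - 1}{n_i} \right)^2
  \frac{n_i (n_i + 1 - k_i)}{(k_i - 1)^2 (k_i - 2)} \, p_i^2(X)
\tag{6}
$$
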

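It is noted that the mean and variance are functions of $X$ because the
expectation was taken with respect to $u$ instead of $X$. Assuming that $t_i$
is univariate Gaussian distributed, the probability of correct decision, for
given $X$, is given by

$$
\mathrm{Prob}(t_1 > t_2 \mid H_1, X)
 = \iint_{t_1 > t_2} \frac{1}{\sqrt{2\pi} \, \sigma_1} e^{-(t_1 - \bar t_1)^2 / 2\sigma_1^2}
   \, \frac{1}{\sqrt{2\pi} \, \sigma_2} e^{-(t_2 - \bar t_2)^2 / 2\sigma_2^2} \, dt_1 \, dt_2
 = \int_{(\bar t_2 - \bar t_1)/\sqrt{\sigma_1^2 + \sigma_2^2}}^{\infty}
   \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz
\tag{7}
$$
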


Although the average probability of correct decision may be obtained by taking
the expectation of Eq. (7) with respect to $X$, the resulting expression is
difficult to evaluate. If, however, we assume both $p_1(X)$ and $p_2(X)$ are
multivariate Gaussian with sample estimates of means and common covariance
matrix as described in the previous section, then an upper bound of the
probability of correct decision may be obtained by evaluating Eq. (7) at
$X = \bar X_1$. By neglecting the second term for $\bar t_i$ in Eq. (6), a
sample plot of the probability bound of correct decision versus the estimated
Mahalanobis distance is shown in Fig. 3.
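The bound described above can be sketched numerically under explicit assumptions that are not spelled out in the legible part of the source: the mean of $t_i$ is taken as $p_i(X)$ (second term neglected), the variance is taken as $p_i^2(X)(n_i - k_i + 1)/(n_i(k_i - 2))$, and $t_1$ and $t_2$ are treated as independent Gaussians. At $X = \bar X_1$ with a common covariance, the density ratio $p_2/p_1$ equals $e^{-D^2/2}$, so only $D$, $n_i$ and $k_i$ are needed. The function names are hypothetical.

```python
from math import erf, exp, sqrt


def std_normal_cdf(z):
    """Cumulative distribution function of a standard Gaussian."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def relative_variance(n_i, k_i):
    """Assumed sigma_i^2 / p_i(X)^2 with the second term of the mean neglected."""
    if not 2 < k_i <= n_i:
        raise ValueError("need 2 < k_i <= n_i")
    return (n_i - k_i + 1) / (n_i * (k_i - 2))


def correct_decision_bound(d, n1, n2, k1, k2):
    """Probability that t1 > t2 given H1, evaluated at the class-1 sample mean.

    d is the estimated Mahalanobis distance; both densities are assumed
    Gaussian with a common covariance, so p2 / p1 = exp(-d^2 / 2) there.
    """
    ratio = exp(-0.5 * d * d)
    spread = sqrt(relative_variance(n1, k1) + relative_variance(n2, k2) * ratio**2)
    return std_normal_cdf((1.0 - ratio) / spread)


if __name__ == "__main__":
    for d in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(d, round(correct_decision_bound(d, 5, 5, 3, 3), 4))
```

Under these assumptions the bound starts at 0.5 for $D = 0$ and rises with $D$ toward a ceiling set by $n_1$ and $k_1$ alone, since the class-2 density at $\bar X_1$ becomes negligible.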


1. 


2. 


3. 


4.


5. 